Abstract

I consider the uncertainties in parton distributions and the consequences for
hadronic cross-sections. There is ever-increasing sophistication in the relationship
between the uncertainties of the distributions and the errors on the
experimental data used to extract them. However, I demonstrate that this uncertainty
is frequently subsumed by that due to the choice of data used in fits,
and more surprisingly by the precise details of the theoretical framework used.
Variations in heavy flavour prescriptions provide striking examples.

1 Introduction

When calculating cross-sections for scattering processes involving hadronic particles one requires
detailed knowledge of the input parton distributions. The uncertainties in the latter propagate into the
uncertainties on the former, and are often significant and sometimes dominant. The parton distributions
can be derived within QCD using the Factorization Theorem, i.e. the cross-section for a physical
process at the LHC can be written in the factorised form

$\sigma(pp \to P + X) = \sum_{i,j} C^P_{ij}(x_1, x_2, \alpha_s(M^2)) \otimes f_i(x_1, M^2) \otimes f_j(x_2, M^2)$ (1)

up to small corrections, where P represents some arbitrary process with hard scale M (e.g. particle mass,
jet $E_T$, ...). The coefficient functions $C^P_{ij}(x_1, x_2, \alpha_s(M^2))$ describing the hard scattering process of the
two incoming partons are process dependent but calculable as a power-series in $\alpha_s(M^2)$. The $f_i(x, M^2)$
are the parton distributions, heuristically the probability of finding a parton of type i carrying a fraction
x of the momentum of the proton. The parton distributions are not calculable from first principles, but
evolve with $M^2$ in a perturbative manner governed by the splitting functions $P_{ij}(x, \alpha_s(M^2))$ which are
calculable order by order in perturbation theory. Hence, once measured at one scale the distributions can
be predicted at other scales.

In this article I will briefly review the extraction of the parton distributions and the resulting un- 
certainties. This is an update of a previous article in this series of Workshops [1], so I will concentrate on 
new developments. A full discussion of fitting procedures and uncertainties due to experimental errors 
on the input data is found in [1], but I will very briefly restate the essentials, including some updates. 

There are a variety of sets of parton distributions which are obtained by a comparison to all available
data (so-called global fits) [2, 3] or to smaller subsets of mainly structure function data [4, 5, 6],
sometimes only in the nonsinglet sector [7, 8]. All follow the same general principle. The fit usually
proceeds by starting the parton evolution at a low scale $Q_0^2$ and evolving partons upwards (sometimes
also downwards) using fixed order evolution equations. The default has long been next-to-leading order 
(NLO), but the next-to-next-to-leading order (NNLO) splitting functions were recently calculated [9],
and sets of NNLO distributions are also available ifTTl ITOl . In principle, there are 11 different parton 
distributions (assuming isospin symmetry and ignoring the top quark) - the 5 quarks, up, down, strange, 
charm, and bottom and their antiquarks, and the gluon distribution. Until recently these were not all 
considered independent, but there is now some evidence for asymmetry between strange quarks and
antiquarks [13], and moreover all quarks evolve slightly differently from their antiquarks due to evolution
effects which begin at NNLO. However, in practice $m_c, m_b \gg \Lambda_{\rm QCD}$, so the heavy parton distributions
are usually determined perturbatively and there are 7 independent input parton sets, each parameterised
in a particular form, e.g.

$xf(x, Q_0^2) = A(1-x)^{\eta}(1 + \epsilon x^{0.5} + \gamma x)x^{\delta}.$ (2)

Fig. 1: The best value of $\sigma_W$ and the uncertainty using $\Delta\chi^2 = 1$ for each data set in the CTEQ fit (left) and the
90% confidence limits for each data set as a function of $\sqrt{\Delta\chi^2}$ - 100 for one particular eigenvector

The partons are constrained by a number of sum rules: i.e. conservation of the number of valence up and 
down quarks, zero number asymmetry for the other quarks and the conservation of the momentum carried 
by partons. The last is an important constraint on the form of the gluon, which is only probed indirectly. 
In determining partons one needs to consider that not only are there many different distributions, but there 
is also a wide distribution of x from 0.75 to 0.00003. One needs many different types of experiment for 
full determination, as discussed in [1]. For instance, the MRST (now MSTW [14]) group use 29 different
types of data set. 
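
As an illustration of how the sum rules constrain the inputs, the normalisation A of a valence-like distribution of the form in Eq. (2) can be fixed analytically by the number sum rule, because each term integrates to an Euler beta function. The sketch below is only indicative: the shape parameters are hypothetical values chosen for the example (not fitted values from any group), and the function names are introduced here for illustration.

```python
from math import gamma


def beta(a, b):
    """Euler beta function B(a, b) = integral of x^(a-1) (1-x)^(b-1) over [0, 1]."""
    return gamma(a) * gamma(b) / gamma(a + b)


def moment(n, delta, eta, eps, gam):
    """Integral over [0, 1] of x^n f(x) with A = 1, where
    x f(x) = x^delta (1-x)^eta (1 + eps x^0.5 + gam x) as in Eq. (2).
    Requires delta + n > 0 for the integral to converge."""
    a = delta + n
    return (beta(a, eta + 1.0)
            + eps * beta(a + 0.5, eta + 1.0)
            + gam * beta(a + 1.0, eta + 1.0))


def normalisation(number, delta, eta, eps, gam):
    """Value of A for which the number integral of f(x) equals `number`."""
    return number / moment(0, delta, eta, eps, gam)


# Hypothetical shape parameters for an up-valence-like input (assumed, not fitted).
delta, eta, eps, gam = 0.5, 3.0, 0.0, 0.0
A = normalisation(2.0, delta, eta, eps, gam)             # two valence up quarks
momentum_fraction = A * moment(1, delta, eta, eps, gam)  # share of the proton momentum
print(A, momentum_fraction)
```

In a real fit the momentum fractions of all partons are summed and required to equal one, which is what ties the normalisation of the gluon to the quark inputs.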

The quality of the fit is determined by the $\chi^2$ of the fit to data, which may be calculated in various
ways. The simplest is to add statistical and systematic errors in quadrature, which ignores correlations 
between data points, but is sometimes quite effective. Also, the information on the data often means that 
only this method is available. More properly one uses the full covariance matrix which is constructed as 



$C_{ij} = \delta_{ij}\,\sigma_{i,{\rm stat}}^2 + \sum_{k=1}^{n} \rho^k_{ij}\,\sigma_{k,i}\,\sigma_{k,j}, \qquad \chi^2 = \sum_{i=1}^{N}\sum_{j=1}^{N} (D_i - T_i(a))\,C^{-1}_{ij}\,(D_j - T_j(a)),$ (3)



where k runs over each source of correlated systematic error, $\rho^k_{ij}$ are the correlation coefficients, N is the
number of data points, $D_i$ is the measurement and $T_i(a)$ is the theoretical prediction depending on parton
input parameters a. An alternative that produces identical results if the errors are small is to incorporate
the correlated errors into the theory prediction



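$f_i(a, s) = T_i(a) + \sum_{k=1}^{n} s_k \Delta_{ik}, \qquad \chi^2 = \sum_{i=1}^{N} \left(\frac{D_i - f_i(a, s)}{\sigma_{i,{\rm unc}}}\right)^2 + \sum_{k=1}^{n} s_k^2,$ (4)

where $\Delta_{ik}$ is the one-sigma correlated error for point i from source k, $\sigma_{i,{\rm unc}}$ is the uncorrelated error
on point i and n is the number of sources of correlated error. One can solve analytically for the $s_k$ [15].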
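
The equivalence of the two definitions can be checked numerically in a small toy case. The sketch below assumes a single, fully correlated systematic source (correlation coefficient 1, with the correlated error on each point playing the role of $\Delta_{ik}$) and identifies the statistical error of Eq. (3) with the uncorrelated error of Eq. (4). The data values and function names are invented for the illustration.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def chi2_covariance(data, theory, sigma_unc, delta):
    """Eq. (3) with one fully correlated source: C_ij = delta_ij sigma_i^2 + Delta_i Delta_j."""
    n = len(data)
    resid = [d - t for d, t in zip(data, theory)]
    cov = [[delta[i] * delta[j] + (sigma_unc[i] ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    weighted = solve(cov, resid)  # C^{-1} (D - T)
    return sum(r * w for r, w in zip(resid, weighted))


def chi2_shifted(data, theory, sigma_unc, delta):
    """Eq. (4) with one source, minimised analytically over the shift s."""
    resid = [d - t for d, t in zip(data, theory)]
    num = sum(r * dl / sig ** 2 for r, dl, sig in zip(resid, delta, sigma_unc))
    den = 1.0 + sum((dl / sig) ** 2 for dl, sig in zip(delta, sigma_unc))
    shift = num / den  # optimal systematic shift
    chi2 = sum(((r - shift * dl) / sig) ** 2
               for r, dl, sig in zip(resid, delta, sigma_unc))
    return chi2 + shift ** 2, shift


# Invented toy numbers: three measurements with a common normalisation-like error.
data = [1.10, 2.05, 2.90]
theory = [1.00, 2.00, 3.00]
sigma_unc = [0.10, 0.10, 0.10]
delta = [0.05, 0.10, 0.15]

chi2_cov = chi2_covariance(data, theory, sigma_unc, delta)
chi2_shift, best_shift = chi2_shifted(data, theory, sigma_unc, delta)
print(chi2_cov, chi2_shift, best_shift)
```

For additive errors of this kind the two numbers agree to rounding error. The shifted form has the practical advantage of reporting the preferred systematic shift explicitly.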

Having defined the fit quality there are a number of different approaches for obtaining parton 
uncertainties. The most common is the Hessian (Error Matrix) approach. One defines the Hessian matrix 

by

$\chi^2 - \chi^2_{\rm min} \equiv \Delta\chi^2 = \sum_{i,j} H_{ij}(a_i - a_i^{(0)})(a_j - a_j^{(0)}).$ (5)



One can then use the standard formula for linear error propagation: 

$(\Delta F)^2 = \Delta\chi^2 \sum_{i,j} \frac{\partial F}{\partial a_i}(H^{-1})_{ij}\frac{\partial F}{\partial a_j}.$ (6)



This has been used to find partons with errors by H1 [0 and Alekhin [4]. In practice it is problematic due
to extreme variations in $\Delta\chi^2$ in different directions in parameter space. This is improved by finding and
rescaling the eigenvectors of H, a method developed by CTEQ [16, 17], and now used by most groups.
The uncertainty on a physical quantity is

$(\Delta F)^2 = \frac{1}{4}\sum_{i}\left(F(S_i^{(+)}) - F(S_i^{(-)})\right)^2,$ (7)

where $S_i^{(+)}$ and $S_i^{(-)}$ are PDF sets displaced along eigenvector directions by the given $\Delta\chi^2$.

Fig. 2: Comparison of the benchmark gluon distributions and $d_V$ distributions
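
For an observable that is linear in the parameters, linear propagation with the Hessian, Eq. (6), and the spread over eigenvector sets, Eq. (7), give the same answer. The following two-parameter toy is a sketch under assumptions: the Hessian, best-fit point and observable are invented, and the squared spread over eigenvector sets is taken with a prefactor of one quarter, which is the value consistent with Eqs. (5) and (6).

```python
from math import atan2, cos, sin, sqrt


def delta_f_linear(hessian, grad, delta_chi2):
    """Eq. (6): (Delta F)^2 = Delta chi^2 * g^T H^{-1} g for a symmetric 2x2 Hessian."""
    (a, b), (_, d) = hessian
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    quad = sum(grad[i] * inv[i][j] * grad[j] for i in range(2) for j in range(2))
    return sqrt(delta_chi2 * quad)


def eigen_sets(hessian, best_fit, delta_chi2):
    """Return pairs (S_plus, S_minus) displaced along each eigenvector of H
    by the distance at which chi^2 has risen by delta_chi2."""
    (a, b), (_, d) = hessian
    theta = 0.5 * atan2(2.0 * b, a - d)  # angle that diagonalises a symmetric 2x2 matrix
    vectors = [(cos(theta), sin(theta)), (-sin(theta), cos(theta))]
    sets = []
    for v in vectors:
        eigenvalue = a * v[0] * v[0] + 2.0 * b * v[0] * v[1] + d * v[1] * v[1]
        step = sqrt(delta_chi2 / eigenvalue)  # step^2 * eigenvalue = Delta chi^2
        plus = tuple(p + step * c for p, c in zip(best_fit, v))
        minus = tuple(p - step * c for p, c in zip(best_fit, v))
        sets.append((plus, minus))
    return sets


def delta_f_eigen(observable, sets):
    """Eq. (7): (Delta F)^2 = (1/4) * sum_i [F(S_i^+) - F(S_i^-)]^2."""
    return 0.5 * sqrt(sum((observable(p) - observable(m)) ** 2 for p, m in sets))


# Invented toy inputs.
hessian = [[4.0, 1.0], [1.0, 3.0]]
best_fit = (1.0, 2.0)
grad = (0.5, -1.0)


def toy_observable(a):
    """Linear toy observable whose gradient is `grad`."""
    return grad[0] * a[0] + grad[1] * a[1]


tolerance = 50.0  # Delta chi^2, as used by MRST/MSTW in the text
err_linear = delta_f_linear(hessian, grad, tolerance)
err_eigen = delta_f_eigen(toy_observable, eigen_sets(hessian, best_fit, tolerance))
print(err_linear, err_eigen)
```

The eigenvector form is the more convenient one in practice, since a user needs only the published displaced sets and never the Hessian itself.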

One can also investigate the uncertainty on a given physical quantity using the Lagrange Multiplier 
method, first suggested by CTEQ [15] and also used by MRST [18]. One performs the global fit while
constraining the value of some physical quantity, i.e. minimise 



$\Psi(\lambda, a) = \chi^2_{\rm global}(a) + \lambda F(a)$ (8)

for various values of $\lambda$. This gives the set of best fits for particular values of the parameter F(a) without
relying on the quadratic approximation for $\Delta\chi^2$, but has to be done anew for each quantity.
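
The mechanics of the Lagrange Multiplier scan can be sketched with a toy model. Everything in the block below is an assumption made for illustration: the two-parameter $\chi^2$ (with a quartic term so that it is not exactly quadratic), the observable, and the plain gradient-descent minimiser, which stands in for the minimisation package a real global fit would use.

```python
def chi2_global(a):
    """Toy stand-in for the global chi^2, minimised at a = (1, 2)."""
    d0, d1 = a[0] - 1.0, a[1] - 2.0
    return 4.0 * d0 * d0 + 2.0 * d0 * d1 + 3.0 * d1 * d1 + 0.5 * d0 ** 4


def observable(a):
    """Toy physical quantity F(a)."""
    return a[0] + a[1]


def minimise(func, start, step=0.05, iterations=2000, h=1e-6):
    """Plain gradient descent with central-difference gradients (adequate for this toy)."""
    point = list(start)
    for _ in range(iterations):
        grad = []
        for i in range(len(point)):
            up = point[:]
            up[i] += h
            down = point[:]
            down[i] -= h
            grad.append((func(up) - func(down)) / (2.0 * h))
        point = [p - step * g for p, g in zip(point, grad)]
    return point


def lagrange_scan(multipliers):
    """For each lambda, minimise Psi = chi2_global + lambda * F and record (lambda, F, chi2)."""
    results = []
    for lam in multipliers:
        best = minimise(lambda a: chi2_global(a) + lam * observable(a), [0.0, 0.0])
        results.append((lam, observable(best), chi2_global(best)))
    return results


scan = lagrange_scan([-4.0, -2.0, 0.0, 2.0, 4.0])
for lam, f_value, chi2_value in scan:
    print(lam, f_value, chi2_value)
```

Reading off the values of F at which the $\chi^2$ has risen by the chosen tolerance gives the uncertainty on F directly, at the price of repeating the scan for every observable of interest.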

In each approach there is uncertainty in choosing the "correct" $\Delta\chi^2$. In principle this should be
one unit, but given the complications of a full global fit this gives unrealistically small uncertainties. This
can be seen in the left of Fig. 1, where the variation in the predictions for $\sigma_W$ using $\Delta\chi^2 = 1$ for each
data set has an extremely wide scatter compared to the uncertainty. CTEQ choose $\Delta\chi^2 \sim 100$ ifTSI .
The 90% confidence limits for the fits to the larger individual data sets when $\sqrt{\Delta\chi^2}$ in the CTEQ fit is
increased by a given amount are shown in the right of Fig. 1. As one sees, a couple of sets may be some
way beyond their 90% confidence limit for $\Delta\chi^2 = 100$. The MRST/MSTW group chooses $\Delta\chi^2 = 50$
to represent the 90% confidence limit for the fit. Other groups with much smaller data sets and fewer
complications still use $\Delta\chi^2 = 1$.

Fig. 3: Comparison of the benchmark gluon and $d_V$ distribution with the corresponding MRST2001E partons

There are other approaches to finding the uncertainties. In the offset method the best fit is obtained
by minimising the $\chi^2$ using only uncorrelated errors. The systematic errors on the parton parameters $a_i$
are determined by letting each $s_k = \pm 1$ and adding the deviations in quadrature. This method was used
in early H1 fits [19] and by early ZEUS fits [20], but is uncommon now. There is also the statistical
approach used by the Neural Network group [8]. Here one constructs a set of Monte Carlo replicas $\sigma^k(p_i)$
of the original data set $\sigma^{\rm data}(p_i)$ which gives a representation of $P[\sigma(p_i)]$ at points $p_i$. Then one trains
a neural network for the parton distribution function on each replica, obtaining a representation of the
pdfs $q_i^{({\rm net})(k)}$. The set of neural nets is a representation of the probability density, i.e. the mean $\mu_O$ and
deviation $\sigma_O$ of an observable O are given by
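
$\mu_O = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}} O[q_i^{({\rm net})(k)}], \qquad \sigma_O^2 = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}} \left(O[q_i^{({\rm net})(k)}] - \mu_O\right)^2,$

with $N_{\rm rep}$ the number of replicas. One can incorporate full information about measurements and their error correlations in the distribution
of $\sigma^{\rm data}(p_i)$. This does not rely on the approximation of linear propagation of errors but is more
complicated and time intensive. It is currently done for the nonsinglet sector only.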
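
The replica procedure can be sketched in a few lines. In the toy below a one-parameter weighted least-squares fit of a straight line through the origin replaces the neural network, purely to keep the example short. The data points, errors, seed and function names are all invented for the illustration, and correlations between data points are ignored.

```python
import random
from math import sqrt


def make_replicas(data, errors, n_rep, seed=12345):
    """Generate n_rep pseudo-data sets by Gaussian smearing of each point (no correlations)."""
    rng = random.Random(seed)
    return [[rng.gauss(d, e) for d, e in zip(data, errors)] for _ in range(n_rep)]


def fit_slope(points, values, errors):
    """Weighted least-squares slope for y = c * x (stand-in for training a net on a replica)."""
    num = sum(x * y / e ** 2 for x, y, e in zip(points, values, errors))
    den = sum(x * x / e ** 2 for x, e in zip(points, errors))
    return num / den


def replica_mean_and_deviation(points, data, errors, observable, n_rep=1000):
    """Mean and standard deviation of an observable over the ensemble of replica fits."""
    replicas = make_replicas(data, errors, n_rep)
    values = [observable(fit_slope(points, rep, errors)) for rep in replicas]
    mean = sum(values) / n_rep
    variance = sum((v - mean) ** 2 for v in values) / n_rep
    return mean, sqrt(variance)


# Invented toy measurements.
points = [1.0, 2.0, 3.0, 4.0]
data = [2.1, 3.9, 6.2, 7.8]
errors = [0.2, 0.2, 0.2, 0.2]

# Observable: the fitted model extrapolated to x = 5.
mean, deviation = replica_mean_and_deviation(points, data, errors, lambda c: 5.0 * c)
print(mean, deviation)
```

Because the uncertainty is read off from the spread of the ensemble, no tolerance on the $\chi^2$ and no linearisation of the observable enter the procedure.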



2 Sources of Uncertainty 

In recent years there has been a great deal of work on the correct and complete inclusion of the ex- 
perimental errors on the data when extracting the partons and their uncertainties. However, to obtain a 
complete estimate of errors, one also needs to consider the effect of the decisions and assumptions made 
when performing the fit, e.g. cuts made on the data, data sets fit and parameterization for the input sets. 

As an exercise for the HERA-LHC [21] workshop, partons were produced from fits to some sets of
structure function data for $Q^2 > 9\,{\rm GeV}^2$ using a common form of parton inputs at $Q_0^2 = 1\,{\rm GeV}^2$. Partons
were obtained using the rigorous treatment of all systematic errors (labelled Alekhin) and using the
simple quadratures approach (labelled MRST), both using $\Delta\chi^2 = 1$ to define the limits of uncertainty.
This benchmark test is clearly a very conservative approach to fitting that should give reasonable partons
with bigger than normal uncertainties. As seen in Fig. 2, there are small differences in the central values
and similar errors, i.e. the two sets are fairly consistent. It is more interesting to compare the HERA-LHC
benchmark partons to partons obtained from a global fit [18], where the uncertainty is determined
using $\Delta\chi^2 = 50$. There is an enormous difference in the central values, sometimes many $\sigma$, as seen in
Fig. 3, although the uncertainties are similar using $\Delta\chi^2 = 1$ compared to $\Delta\chi^2 = 50$ with approximately
twice the data. Moreover, $\alpha_s(M_Z^2) = 0.1110 \pm 0.0015$ from the benchmark fit compared to
$\alpha_s(M_Z^2) = 0.119 \pm 0.002$ from the global fit. Something is clearly seriously wrong in one of these
analyses, and indeed partons from the benchmark fit fail when compared to most data sets not included.
This implies that partons should be constrained by all possible reliable data.

Fig. 4: Comparison of MRST and Alekhin NNLO gluon distributions at high x
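
The size of the disagreement between the two values of $\alpha_s(M_Z^2)$ quoted above can be put in units of the combined error. The sketch below treats the two errors as independent, which is an assumption made only for this estimate. The benchmark data are a subset of the global data, so the errors are in fact correlated.

```python
from math import sqrt


def pull(value_a, error_a, value_b, error_b):
    """Difference between two results in units of their combined error,
    treating the two errors as independent (an assumption)."""
    return (value_b - value_a) / sqrt(error_a ** 2 + error_b ** 2)


# Benchmark fit versus global fit values of alpha_s(M_Z^2) quoted in the text.
alpha_s_pull = pull(0.1110, 0.0015, 0.119, 0.002)
print(alpha_s_pull)
```

Under that assumption the two determinations differ by a little over three combined standard deviations.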

The benchmark partons above are not a realistic set of partons, but similar examples are found 
when comparing different sets of published parton distributions. For example, the valence quarks ex- 
tracted from the nonsinglet analysis in [7] (see Figs. 9 and 10) are different from a variety of alternatives 
by much more than the uncertainties. Indeed, various gluon distributions, all obtained by fitting to small 
x HERA data 01221 are very different despite what is meant to be the main constraint on the data being 
the same in each case. It is particularly illustrative to look at the difference in the high-x gluons of MRST
and Alekhin in Fig. 4. This is for NNLO, but is similar at NLO. Here the difference above x = 0.2 is
a large factor, and very much bigger than each uncertainty (calculated using $\Delta\chi^2 = 1$ for Alekhin and
$\Delta\chi^2 = 50$ for MRST). It seems that the HERA data require a gluon distribution for the very best fit
which is incompatible with the Tevatron jet data [23], and the standard error analysis does not accommodate
this. As a further point, at NNLO one of the few hard cross-sections required in a global fit which
is not fully known is that for the jet cross-section. It might be argued that one should leave the data out 
rather than rely on the NLO hard cross-section, as done by MRST. However, this correction is very likely 
to be ~ 5%, whereas the change in the gluon distribution if the data are left out can be > 100%. This 
implies, to the author at least, that it is better to include a data set relying on a slight approximation than 
to leave it out and obtain partons which are completely incompatible with it. 

Even when similar data sets are fit, there can still be significant differences in parton distributions
and their predictions. The prediction for $\sigma_W$ at NLO at the LHC using CTEQ6.5 partons is 202 ± 9 nb
and using MRST04 partons is 190 ± 5 nb. This is despite the rather similar data sets and procedures used
in the two fits. The different predictions are easily explained by looking at the left of Fig. 5. The CTEQ
gluon is much bigger than MRST at small x and drives quark evolution to be larger. This difference
is not fully understood but is probably partially due to the fact that MRST have lower $Q^2$ cuts on the
structure function data, and also due to the different input parameterisations for the gluon. MRST allow
their gluon to be negative at small x at input ($Q_0^2 = 1\,{\rm GeV}^2$) while the CTEQ gluon is positive at small
x at input ($Q_0^2 = 1.69\,{\rm GeV}^2$), but is very small indeed. (Further analysis suggests a slightly negative input
gluon is preferred, but only barely IU2I so the freedom is not introduced.)

Fig. 5: The MRST gluon distribution with percentage uncertainties, and the central CTEQ distribution (left) and
the uncertainties on the MRST, CTEQ and Alekhin gluon distributions at $Q^2 = 5\,{\rm GeV}^2$ (right)

The parameterization can have an even more dramatic effect on the uncertainty than on the central 
value. The uncertainties on the gluon distributions for MRST, CTEQ and Alekhin are shown in the right 
of Fig. 5. One would expect the uncertainty to increase significantly at very small x as constraints die
away. This happens for the MRST gluon. The Alekhin gluon does not have as much freedom, but is
input at higher scales and behaves like $x^{-\lambda}$ at small x. The uncertainty is due to the uncertainty in $\lambda$ (the
situation is similar for ZEUS and H1 partons). The CTEQ input gluon behaves like $x^{\lambda}$ at small x where
$\lambda$ is large and positive. The small-x input gluon is tiny and has a very small absolute error. At higher $Q^2$
all the uncertainty is due to evolution driven by the higher-x, well-determined gluon. The very small-x
gluon is no more uncertain than at x = 0.01 to 0.001.

Another important source of uncertainty only now becoming clear is due to the strange distribution. 
Until recently this was taken to be a fixed and constant fraction of the total sea quark distribution. This did 
not allow any intrinsic uncertainty on the strange quark. It is now being fit more directly by comparison 
to dimuon data in neutrino scattering [13]. In the MSTW fits [14] this results in an increased uncertainty
on all sea quarks since allowing the strange to vary independently gives the up and down quarks more
freedom. CTEQ have produced specific parton sets with fits to the strange quark [24], and in Fig. 6 we
see predictions from these for production of $W^+ + c$. CTEQS0 represents the best fit when the strange
is fit directly. Worryingly, this can be outside the uncertainty band for the default set.



3 Theoretical Uncertainties 

Even if we had an unambiguous definition for the parameterization and the data sets and cuts used, there 
would still be additional uncertainties due to the limited accuracy of the theoretical calculations. The 
sources of theoretical error include higher twist at low scales and higher orders in $\alpha_s$, and it now seems
likely that there may be sizable corrections from higher order electroweak corrections at the LHC (see
e.g. [25]), due to $\alpha_W \ln^2(E^2/M_W^2)$ terms in the expansion. The higher order QCD errors are due not only
to fixed order corrections, but also to enhancements at large and small x because of terms of the form
$\alpha_s^n \ln^{n-1}(1/x)$ and $\alpha_s^n \ln^{2n-1}(1-x)$ in the perturbative expansion. This means that renormalization
and factorization scale variation are not a reliable way of estimating higher order effects because a scale
variation at one order will not give any indication of an extra $\ln(1/x)$ or $\ln(1-x)$ at higher orders. Hence,
in order to investigate the true theoretical error we must consider some way of performing correct large
and small x resummations, and/or use what we already know about going to higher orders.

Fig. 6: Uncertainty of predictions for W + (caption truncated; plot labels: p p -> W + c X, LHC, strangeness series S2, S4, S0, CTEQ6.5M, S1, S3 in order of decreasing cross section)

Fig. 7: Comparison between the NLO and NNLO up quark distribution

Fig. 8: ... opposed to CTEQ6 (right) (start of caption missing; plot labels: u at $\mu$ = 2 GeV; $\sigma \pm \delta\sigma_{\rm PDF}$ in units of $\sigma$(CTEQ65M), LHC, NLO, PRELIMINARY; CTEQ6.5, CTEQ6.1, IC-Sea)

We are now able to look at the size of the corrections as we move from NLO to NNLO. The up 
quark distribution at the two orders is illustrated in Fig. 7. As one can see, the change in the central
value is somewhat larger than the uncertainty due to the experimental errors. The predictions for various
physical processes have been calculated. The change for quark-dominated processes, such as W and Z
production, is not very large, i.e. 4% or less [26], but is sometimes bigger than the quoted uncertainty
at each order. Changes in gluon dominated quantities, such as $F_L(x, Q^2)$, can be much larger [27].
Similarly there are implications that resummations may have significant effects on LHC predictions, 
particularly at high rapidity [28]. 

Very recently it has become clear that a less obvious source of theoretical errors can have surpris- 
ingly large effects, i.e. the precise treatment of heavy quark effects. For many years CTEQ have had 
a procedure for extrapolating from the limit where quarks are very heavy to the limit where they are 
effectively massless, i.e. a general-mass variable flavour number scheme (GM-VFNS) [29]. Nevertheless,
they have chosen the scheme where the quark masses are zero as soon as the heavy quark evolution
begins, i.e. the zero-mass variable flavour number scheme (ZM-VFNS), to be the default parton set. In the
most recent analysis [3] they have switched to the GM-VFNS definition as default and noticed that this
has a very large effect on their small-x light quark distributions, mainly determined by fitting to HERA
data, where mass corrections are important, and on LHC predictions. This is shown in Fig. 8, where one
sees the prediction for $\sigma_W$ increase by 8%.

Perhaps even more surprising is the change observed by MRST at NNLO. Because early approximate
"NNLO" sets (e.g. [30]) were based on approximate splitting functions, the MRST group used a
(fully explained) approximate treatment of heavy quarks at NNLO, in particular not including the dis- 
continuities at transition points that occur at this order QUI . The correction of this approximate NNLO 
VFNS between [2] and [10] using the scheme in [31] led to large corrections to the gluon distribution
at small x and, by evolution, also to the light quark distributions at higher scales, as seen in Fig. 9. This
results in the corrections to LHC cross-sections shown in Table 1, i.e. up to 6%. In this case the change
in procedure was less dramatic than that for the CTEQ6.5 result, where the original approximation was 
of massless quarks, and was also at one order lower. The size of the change was certainly unexpected. 
It is important to note that in both these cases the change is not really representative of an uncertainty, 
since each represents a correction of something that was known to be wrong. However, in each case the
"wrongness" was thought to be an approximation requiring only a small correction, an expectation that
was optimistic. Some parton sets currently available are still extracted using similar (or worse)
"approximations", and even in the best case the limited order of the calculation means that everything is
to some extent an approximation, with the size of the correction being by definition uncertain.

Fig. 9: Comparison of the NNLO gluon distribution (and its uncertainty) with the previous approximate NNLO
distribution at $Q^2 = 5\,{\rm GeV}^2$ (left), and the ratio at $Q^2 = 10^4\,{\rm GeV}^2$ for both the gluon and the up quark (right).

Table 1: Total W and Z cross-sections multiplied by leptonic branching ratios at the Tevatron and the LHC,
calculated at NNLO using the updated NNLO parton distributions. The predictions using the 2004 NNLO sets are
shown in brackets.

            $B_{l\nu} \cdot \sigma_W$ (nb)    $B_{l^+l^-} \cdot \sigma_Z$ (nb)
Tevatron    2.727 (2.693)                     0.2534 (0.2518)
LHC         21.42 (20.15)                     2.044 (1.918)
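
The relative size of the shifts in Table 1 can be checked directly from the tabulated numbers. The short sketch below does only that arithmetic. The dictionary layout and function name are choices made for this example.

```python
# (updated NNLO prediction, 2004 NNLO prediction) in nb, copied from Table 1.
table_1 = {
    ("Tevatron", "W"): (2.727, 2.693),
    ("Tevatron", "Z"): (0.2534, 0.2518),
    ("LHC", "W"): (21.42, 20.15),
    ("LHC", "Z"): (2.044, 1.918),
}


def percent_change(new, old):
    """Relative change of the updated prediction with respect to the old one, in percent."""
    return 100.0 * (new / old - 1.0)


changes = {key: percent_change(new, old) for key, (new, old) in table_1.items()}
for (collider, boson), change in changes.items():
    print(f"{collider} {boson}: {change:+.2f}%")
```

The Tevatron predictions move by about one percent, while the LHC predictions move by several percent, reflecting the smaller x values probed at the higher energy.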



4 Conclusions 

One can determine the parton distributions from fits to existing data and predict cross-sections at the 
LHC. The fit quality using NLO or NNLO QCD is fairly good. There are various ways of looking 
at uncertainties due to the errors on data. For genuinely global fits, using $\Delta\chi^2 = 1$ is not a sensible
option due to incompatibility between data sets and possibly between data and theory. Uncertainties
due to parton distributions from experimental errors lead to rather small uncertainties, roughly 1-5%, for
most LHC quantities, and are fairly similar for all approaches. However, sometimes the central values
using different sets differ by more than this. The uncertainties from input assumptions, e.g. cuts on data,
sets used, parameterisations etc., are comparable to, and sometimes larger than, the statistical uncertainties. In
particular, the detail of uncertainties on the flavour decomposition of the quarks is still developing. 

Uncertainties from higher orders/resummation in QCD are significant, and electroweak corrections
are also potentially large at very high energies. At the LHC, measurements at high rapidities, e.g. of W and Z
production, would be useful in testing our understanding of QCD. Our limited knowledge of the theory is often
the dominant source of uncertainty. There has recently been much progress: more processes known
at NLO, and some at NNLO; improved heavy flavour treatments; developments in resummations etc.
In particular, essentially full NNLO parton distribution determinations are now possible. But further 
theoretical improvements and complementary measurements are necessary for a full understanding of 
the best predictions and their uncertainties. 